Pronunciation learning for named-entities through crowd-sourcing
نویسندگان
چکیده
Obtaining good pronunciations for named-entities poses a challenge for automated speech recognition because namedentities are diverse in nature and origin, and new entities come up every day. In this paper, we investigate the feasibility of learning named-entity pronunciations using crowd-sourcing. By collecting audio samples from non-linguistic-expert speakers with Mechanical Turk and learning from them, we can quickly derive pronunciations that are more accurate in speech recognition tests than manual pronunciations generated by linguistic experts. Compared to traditional approaches of generating pronunciations, this new approach proves to be cheap, fast, and quite accurate.
منابع مشابه
Learning Pronunciation and Accent from The Crowd
Learning a second language is becoming a more popular trend around the world. But the act of learning another language in a place removed from native speakers is difficult as there is often no one to correct mistakes nor examples to imitate. With the idea of crowd sourcing, we would like to propose an efficient way to learn a second language better.
متن کاملA Fun and Engaging Interface for Crowdsourcing Named Entities
There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is t...
متن کاملWhat did you Mention? A Large Scale Mention Detection Benchmark for Spoken and Written Text
We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities as well as other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia, to noisy spoken data. The benchmark was built through a highly controlled crowd sourcing process to ensure its quality. We descr...
متن کاملLearning with the Web: Spotting Named Entities on the Intersection of NERD and Machine Learning
Microposts shared on social platforms instantaneously report facts, opinions or emotions. In these posts, entities are often used but they are continuously changing depending on what is currently trending. In such a scenario, recognising these named entities is a challenging task, for which off-theshelf approaches are not well equipped. We propose NERDML, an approach that unifies the benefits o...
متن کاملActive Learning and Crowd-Sourcing for Machine Translation
In recent years, corpus based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. Success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014